A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text

نویسندگان

  • Ariel S. Schwartz
  • Marti A. Hearst
چکیده

The volume of biomedical text is growing at a fast rate, creating challenges for humans and computer systems alike. One of these challenges arises from the frequent use of novel abbreviations in these texts, thus requiring that biomedical lexical ontologies be continually updated. In this paper we show that the problem of identifying abbreviations' definitions can be solved with a much simpler algorithm than that proposed by other research efforts. The algorithm achieves 96% precision and 82% recall on a standard test collection, which is at least as good as existing approaches. It also achieves 95% precision and 82% recall on another, larger test set. A notable advantage of the algorithm is that, unlike other approaches, it does not require any training data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Proposed System to Identify and Extract Abbreviation Definitions in Spanish Biomedical Texts for the Biomedical Abbreviation Recognition and Resolution (BARR) 2017

Biomedical Abbreviation Recognition and Resolution (BARR) is an evaluation track of the 2nd Human Language Technologies for Iberian languages (IberEval) workshop, which is a workshop series organized by the Sociedad Española del Procesamiento del Lenguaje Natural (SEPLN). In this first edition of BARR, the focus is on the discovery of biomedical entities and abbreviation, and relating detected ...

متن کامل

Alignment-based Extraction of Abbreviations from Biomedical Text

We present an algorithm for extracting abbreviation definitions from biomedical text. Our approach is based on an alignment HMM, matching abbreviations and their definitions. We report 98% precision and 93% recall on a standard data set, and 95% precision and 91% recall on an additional test set. Our results show an improvement over previously reported methods and our model has several advantag...

متن کامل

Alignment-HMM-based Extraction of Abbreviations from Biomedical Text

We present an algorithm for extracting abbreviation definitions from biomedical text. Our approach is based on an alignment HMM, matching abbreviations and their definitions. We report 98% precision and 93% recall on a standard data set, and 95% precision and 91% recall on an additional test set. Our results show an improvement over previously reported methods and our model has several advantag...

متن کامل

Creating an Online Dictionary of Abbreviations from MEDLINE

Design. Our method uses a statistical learning algorithm, logistic regression, to score abbreviation expansions based on their resemblance to a training set of human-annotated abbreviations. We applied it to Medstract, a corpus of MEDLINE abstracts in which abbreviations and their expansions have been manually annotated. We then ran the algorithm on all abstracts in MEDLINE, creating a dictiona...

متن کامل

The Biomedical Abbreviation Recognition and Resolution (BARR) Track: Benchmarking, Evaluation and Importance of Abbreviation Recognition Systems Applied to Spanish Biomedical Abstracts

Healthcare professionals are generating a substantial volume of clinical data in narrative form. As healthcare providers are confronted with serious time constraints, they frequently use telegraphic phrases, domain-specific abbreviations and shorthand notes. Efficient clinical text processing tools need to cope with the recognition and resolution of abbreviations, a task that has been extensive...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

دوره   شماره 

صفحات  -

تاریخ انتشار 2003